install.packages("tidyverse") # install a package
library(tidyverse) # load the library / package
tidyverse: a collection of packages that work very well together
ggplot2
Do cars with big engines use more fuel than cars with small engines?
mpg data frame (displ: engine size, hwy: car’s fuel efficiency)
head(mpg)
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.80 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.80 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2.00 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2.00 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.80 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.80 1999 6 manu… f 18 26 p comp…
ggplot2: default template
A ggplot2 plot follows the template
ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
ggplot() creates a coordinate system that you can add layers to
data: the dataset to be used in the graph
GEOM_FUNCTION: adds a layer to the plot
mapping and aes: define how variables in your dataset are mapped to visual properties
Plot hwy against displ
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))
Map a third variable to an aesthetic inside aes()
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = class))
Other aesthetics: size, alpha or shape
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
Can specify an aesthetic manually by setting it outside aes(), as color = "blue" does here
In mpg, identify which variables are categorical and which are continuous
aes
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = cty))
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = year))
facet_wrap()
~ followed by a variable name, which must be discrete (but not necessarily categorical)
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
facet_grid()
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
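The remark that a faceting variable must be discrete but need not be categorical can be checked with cyl, which is stored as an integer yet takes only a few distinct values. This is a minimal sketch, assuming ggplot2 and dplyr are loaded through the tidyverse; the object name p is introduced here for illustration only.

```r
library(tidyverse)

# cyl is an integer column, not a character or factor column
class(mpg$cyl)
n_distinct(mpg$cyl) # number of distinct cylinder counts

# facet_wrap() draws one panel per distinct value of cyl
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cyl)
p
```

Faceting on a truly continuous variable such as displ would still run, but it would create one panel per distinct value, which is rarely useful.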
ggplot2 is only about adding layers to a plot
Geoms: the geometrical objects used by ggplot2 to represent data
Plot hwy against displ, this time using geom_smooth instead of geom_point
ggplot(data = mpg) + geom_smooth(mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess'
group: ggplot2 will group the data for these geoms whenever used with a discrete variable, without adding a legend or distinguishing features
ggplot(data = mpg) + geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
## `geom_smooth()` using method = 'loess'
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess'
The same mappings are repeated in geom_point(mapping = aes(x = displ, y = hwy)) + geom_smooth(mapping = aes(x = displ, y = hwy))
Pass the mappings to ggplot() to make them global
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point() + geom_smooth()
With mappings placed in a geom function, ggplot2 will extend or overwrite the global mappings for that layer only
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
## `geom_smooth()` using method = 'loess'
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
## Warning: package 'bindrcpp' was built under R version 3.3.2
## `geom_smooth()` using method = 'loess'
# first plot
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
geom_smooth(se = FALSE)
# second plot
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = drv)) +
geom_smooth(mapping = aes(linetype = drv), se = FALSE)
diamonds dataset, with variables such as price, carat, color, clarity, cut…
geom_bar
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
stat associated to geom_bar
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
ggplot2 provides over 20 stats (e.g. stat_smooth is the stat associated to geom_smooth)
What does geom_col do?
ggplot(data = diamonds) + geom_col(mapping = aes(x = cut, y = carat))
group
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = "x"))
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = cut))
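To make the bar heights in the geom_bar plots above less of a black box, the counts and proportions can be reproduced by hand. This is a sketch only: it assumes dplyr is loaded through the tidyverse, and cut_counts is a name introduced here for illustration.

```r
library(tidyverse)

# Count the diamonds in each cut, then turn the counts into proportions
cut_counts <- diamonds %>%
  count(cut) %>%
  mutate(prop = n / sum(n))
cut_counts

# geom_col() uses the y values as given, so this should draw the same
# proportions as geom_bar() with y = ..prop.., group = 1
ggplot(data = cut_counts) + geom_col(mapping = aes(x = cut, y = prop))
```

In recent ggplot2 releases, after_stat(prop) is the preferred spelling of ..prop.., which still works but may emit a deprecation warning.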
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity))
The stacking is performed by the position adjustment argument, specified by position
position = "identity": overlaps the bars
position = "fill": works like stacking, but makes each set of stacked bars the same height
position = "dodge": places overlapping objects beside one another
g <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))
g + geom_bar(alpha = 1/5, position = "identity")
g + geom_bar(fill = NA, position = "identity")
g + geom_bar(position = "fill")
g + geom_bar(position = "dodge")
Using geom_boxplot, play with the position argument. Specifically, based on the mpg data frame, do a boxplot of hwy on class, differentiating by drv.
Do fill, dodge and identity work?
g <- ggplot(data = mpg, mapping = aes(x = class, y = hwy, fill = drv))
g + geom_boxplot()
g + geom_boxplot(position = "identity")
Others:
coord_flip(): switches the x and y axes
coord_polar(): switches to polar coordinates
# switches x and y
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + geom_boxplot() + coord_flip()
# polar coordinates
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE) +
labs(x = NULL, y = NULL) +
coord_polar()